-
Couldn't load subscription status.
- Fork 92
[POST GuideLLM Refactor] Multi-Turn Rework #374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: features/refactor/base-draft
Are you sure you want to change the base?
[POST GuideLLM Refactor] Multi-Turn Rework #374
Conversation
Signed-off-by: guangli.bao <[email protected]>
Signed-off-by: psydok <[email protected]>
src/guidellm/scheduler/objects.py
Outdated
| @SchedulerMessagingPydanticRegistry.register() | ||
| class ScheduledRequestAugmentation(StandardBaseModel): | ||
| """ | ||
| Adjustments to scheduler logic for a paired request. | ||
| """ | ||
|
|
||
| post_requeue_delay: float = Field( | ||
| description=( | ||
| "Delay in seconds to wait after a request to " | ||
| "queue the next request in the conversation." | ||
| ), | ||
| default=0.0, | ||
| ) | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a part of ScheduledRequestInfo but I figured it was not really necessary to pass it back and forth with each request update. Plus I thought this would be a good interface for adjusting the scheduling of individual requests.
src/guidellm/backends/openai.py
Outdated
| def _apply_history( | ||
| self, | ||
| request: GenerationRequest, | ||
| history: HistoryT[GenerationRequest, GenerationResponse], | ||
| ) -> GenerationRequest: | ||
| """ | ||
| Apply conversation history to the current request. | ||
| """ | ||
|
|
||
| def turn_to_text(turn: tuple[GenerationRequest, GenerationResponse]) -> str: | ||
| req, res = turn | ||
| return f"{req.content}{res.value}" | ||
|
|
||
| request.content = "".join(chain(map(turn_to_text, history), (request.content,))) | ||
| return request | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporary hack until we land request templates.
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> Final pieces needed for image CI work. Fully enables auto `latest`, `stable` tags and old image pruning. ## Details <!-- Provide a detailed list of all changes introduced in this pull request. --> - Add `pipefail` to list-tags command to catch failures - Add missing `ghcr.io/` to skopeo commands - Disable dry-run option for development image cleanup job ## Test Plan Ran with `workflow_dispatch` [see here](https://github.com/vllm-project/guidellm/actions/runs/18108553536) <img width="2032" height="955" alt="2025-09-29T15-45-39" src="https://github.com/user-attachments/assets/b981ab01-fe90-4e15-bf60-cb483508065e" /> <img width="1204" height="579" alt="2025-09-29T15-46-02" src="https://github.com/user-attachments/assets/68118168-2e80-4d45-92cc-47badc1caf16" /> --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`) --------- Signed-off-by: Samuel Monson <[email protected]>
4c4ea5d to
aa81de8
Compare
9ae0532 to
cd43b2c
Compare
…icated combinations
## Summary It's inconvenient to look at metrics. ## Details - ## Test Plan - code launch ## Related Issues - Resolves ##371 --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> <img width="1757" height="1212" alt="image" src="https://github.com/user-attachments/assets/fbfddeac-ca56-40c0-b7ae-d2f17d50823a" /> ## Details <!-- Provide a detailed list of all changes introduced in this pull request. --> - [ ] ## Test Plan <!-- List the steps needed to test this PR. --> - ## Related Issues <!-- Link any relevant issues that this PR addresses. --> - Resolves # --- - [ ] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
## Summary With the default path referring to the versioned build now, users will no longer experience their html reports breaking randomly when the build files are updated. Also fixed versioned build directory path issue that I missed previously --------- Signed-off-by: dalthecow <[email protected]>
## Summary We want to use ITL instead of TPOT. The data we had previously happened to be ITL data, but all of the labels indicate that it is TPOT data. Now the code and labels reflect that it is ITL data. ## Test Plan - Everything works, tests pass, No use of TPOT in the UI --------- Signed-off-by: dalthecow <[email protected]> Co-authored-by: Samuel Monson <[email protected]>
## TODO - Docs - ~CSV arg string support~ CSV arg string now supports single bucket (see last example). Might leave it at that for now. - More validation ## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> This PR is a port of #287 to the v0.4.0 refactor branch. Adds controls for sharing one or more fixed prefixes between samples. See examples bellow. ## Details <!-- Provide a detailed list of all changes introduced in this pull request. --> Adds a `prefix_buckets` argument to the `SyntheticTextDatasetConfig`, each bucket consists of a prefix count, token count, and bucket weight. Prefix count sets the number of unique prefixes to generate for a given bucket, token count is the length of each prompt in the bucket, and bucket weight is used to calculate the proportion of requests the bucket applies to relative to the sum of all bucket weights. Here are a few examples: Here we have one bucket of 32 prefixes of length 2048. Since there are 1024 total samples each prefix will apply to 32 samples. If there is only one bucket than weight can be omitted as the bucket applies to 100% of samples. ```yaml data: prefix_buckets: - prefix_tokens: 2048 prefix_count: 32 prompt_tokens: 256 output_tokens: 256 samples: 1024 ``` In this modified version of the first example 16 of the prompts have 2048 tokens while the other 16 have 1024 tokens. ```yaml data: prefix_buckets: - prefix_tokens: 2048 prefix_count: 16 bucket_weight: 50 - prefix_tokens: 1024 prefix_count: 16 bucket_weight: 50 prompt_tokens: 256 output_tokens: 256 samples: 1024 ``` The prefix tokens of a bucket can also be 0 to disable prefixes for those samples. Here is an example where 40% of the samples have a prefix of 2048 tokens while the other 60% have no prefix. ```yaml data: prefix_buckets: - prefix_tokens: 2048 bucket_weight: 40 - prefix_tokens: 0 bucket_weight: 60 prompt_tokens: 256 output_tokens: 256 samples: 1000 ``` If only a single bucket is needed, it can be set at the top level. This make the changes backwards compatible with the previous interface and allows the CSV string format to work without parsing nested structures (at least for this use-case). ```yaml data: prefix_tokens: 128 prefix_count: 10 prompt_tokens: 256 output_tokens: 256 samples: 1000 ``` ## Test Plan <!-- List the steps needed to test this PR. --> - PR includes unit tests for all synthetic dataset changes (`pytest tests/unit/dataset`) - Scenearios in the Details section can be used against a model server with prefix caching and the cache rate can be confirmed by inspecting console output. ## Related Issues <!-- Link any relevant issues that this PR addresses. --> - Resolves #232 - Closes #287 --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [x] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [x] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`) --------- Signed-off-by: Samuel Monson <[email protected]>
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> Fix to parsing rc ref in CI --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`) Signed-off-by: Samuel Monson <[email protected]>
… and chat completions pathways
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> This is the same fix as #389 but applied to the RC workflow rather than the release workflow as was the original intent with #389. Both workflows need this change so not reverting the other one. --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`) Signed-off-by: Samuel Monson <[email protected]>
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> ## Details <!-- Provide a detailed list of all changes introduced in this pull request. --> - [ ] ## Test Plan <!-- List the steps needed to test this PR. --> - ## Related Issues <!-- Link any relevant issues that this PR addresses. --> - Resolves # --- - [ ] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`) Signed-off-by: Samuel Monson <[email protected]>
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> ## Details <!-- Provide a detailed list of all changes introduced in this pull request. --> - [ ] ## Test Plan <!-- List the steps needed to test this PR. --> - ## Related Issues <!-- Link any relevant issues that this PR addresses. --> - Resolves # --- - [ ] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`) Signed-off-by: Samuel Monson <[email protected]>
Many of the quality errors are due to using the older union style, and have appeared due to the upgrade of the minimum Python version from 3.9 to 3.10 Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
## Summary
<!--
Include a short paragraph of the changes introduced in this PR.
If this PR requires additional context or rationale, explain why
the changes are necessary.
-->
Makes the `max_tokens` request key configurable through an environment
variable per endpoint type. Defaults to `max_tokens` for legacy
`completions` and `max_completion_tokens` for `chat/completions`
## Details
<!--
Provide a detailed list of all changes introduced in this pull request.
-->
- Add the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option which is a
dict mapping from route name -> output tokens key. Default is
`{"text_completions": "max_tokens", "chat_completions":
"max_completion_tokens"}`
## Test Plan
<!--
List the steps needed to test this PR.
-->
-
## Related Issues
<!--
Link any relevant issues that this PR addresses.
-->
- Closes #395
- Closes #269
- Related #210
---
- [x] "I certify that all code in this PR is my own, except as noted
below."
## Use of AI
- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
---------
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
aa81de8 to
48d1b95
Compare
… package and CLI pathways (#414) ## Summary Changed the benchmarking entrypoint to take in an Args object which is now used to load scenarios. It enables a single source of truth in addition to being able to save the exact configurations in the report output. ## Details <!-- Provide a detailed list of all changes introduced in this pull request. --> - [ ] ## Test Plan <!-- List the steps needed to test this PR. --> - ## Related Issues <!-- Link any relevant issues that this PR addresses. --> - Resolves # --- - [ ] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
…odec (#411) ## TODO - [ ] ~~More flexible version locking in multimodal extras group~~ - Goal with this was to add locking for different torchcodec/torch versions but honestly its not worth the hassle - [x] Check for multi-modal libs being installed - [ ] More testing on `encode_audio` ## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> Replaces audio processing libraries with `torchcodec` which eliminates 19 dependencies and brings us inline with what HuggingFace `datasets` is doing. ## Details <!-- Provide a detailed list of all changes introduced in this pull request. --> - ## Test Plan <!-- List the steps needed to test this PR. --> - Run against audio server with ```bash guidellm benchmark run \ --target http://localhost:8000 \ --profile "synchronous" \ --max-requests 20 \ --request-type "audio_transcriptions" \ --data "openslr/librispeech_asr" \ --data-args '{"name": "clean", "split": "test"}' ``` --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [x] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
Signed-off-by: Jared O'Connell <[email protected]>
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> Adds a `tox` env for updating the lock file. Also allows args for mypy env. --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
## Summary Various type fixes with the goal of not breaking anything. --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [x] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
## Summary TODO ## Details TODO ## Test Plan TODO ## Related Issues TODO
Signed-off-by: Samuel Monson <[email protected]>
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> Turns the `guidellm[multimodal]` extras group into `guidellm[audio]` and `guidellm[vision]`. --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
## Summary <!-- Include a short paragraph of the changes introduced in this PR. If this PR requires additional context or rationale, explain why the changes are necessary. --> Install all extras in the container and add `ffmpeg`. --- - [x] "I certify that all code in this PR is my own, except as noted below." ## Use of AI - [ ] Includes AI-assisted code completion - [ ] Includes code generated by an AI application - [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)
cd43b2c to
9669983
Compare
Summary
Details
Test Plan
Related Issues
Use of AI
## WRITTEN BY AI ##)